Two QSPR Methodologies, the Random, and the Super-Descriptors
Abstract
(Received April 21, 2010) The full combinatorial search methodology has been used with a set of molecular connectivity indices, five experimental parameters, and the molar mass to extract the best descriptors for twelve properties of a set of organic solvents. Its performance is compared with that of the greedy search methodology reported in a previous paper. The molecular connectivity indices used with both methodologies belong to different configurations, since they can encode different hydrogen and core-electron contributions and can give rise to configuration-dependent descriptors. The indices of the best configuration-dependent descriptors for the twelve properties, whether molecular connectivity indices and/or experimental indices, have then been pooled and used to derive super-descriptors. These super-descriptors achieve a better description for four properties, including an impressive description of the melting points. A thorough investigation has also been performed on the model quality of random indices, which have been used to derive either ‘zero-level’ descriptors or semi-random descriptors. This has not only given a concrete idea of the model quality of random indices but has also allowed interesting considerations to be drawn about the quality of semi-random descriptors, that is, descriptors based both on random numbers and on molecular connectivity indices and/or experimental parameters. In fact, a few properties can advantageously be described with this type of descriptor. This last investigation has helped to clarify the validity of the q leave-one-out statistics and of the Topliss-Costello rule.
On the other hand, the full combinatorial technique, with either normal descriptors or super-descriptors, has shown the real limits of the greedy search algorithm, has confirmed previous conclusions about the contributions of the hydrogen atoms, and has underlined the importance of pseudoconnectivity indices, experimental indices, and some ad hoc parameters.
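The contrast between the two search strategies can be illustrated with a minimal sketch. It is not the author's implementation. It assumes multilinear regression models scored by a leave-one-out cross-validated statistic, and it uses synthetic data in which most candidate indices are pure random numbers. The function names `loo_q2`, `full_combinatorial`, and `greedy_forward` are hypothetical.

```python
from itertools import combinations

import numpy as np


def loo_q2(X, y):
    """Leave-one-out statistic 1 - PRESS / SS_tot for an OLS model with intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += (y[i] - A[i] @ coef) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def full_combinatorial(X, y, k):
    """Score every k-index subset and return the best one (exhaustive search)."""
    best = max(combinations(range(X.shape[1]), k),
               key=lambda c: loo_q2(X[:, list(c)], y))
    return tuple(best), loo_q2(X[:, list(best)], y)


def greedy_forward(X, y, k):
    """Add one index at a time, always keeping the indices already chosen."""
    chosen = []
    for _ in range(k):
        rest = [j for j in range(X.shape[1]) if j not in chosen]
        chosen.append(max(rest, key=lambda j: loo_q2(X[:, chosen + [j]], y)))
    subset = sorted(chosen)
    return tuple(subset), loo_q2(X[:, subset], y)


if __name__ == "__main__":
    # Synthetic stand-in data (an assumption): 20 "compounds", 8 candidate indices.
    # Only columns 0 and 1 carry signal; the other six act as random indices.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 8))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.2 * rng.normal(size=20)

    print("full combinatorial:", full_combinatorial(X, y, 2))
    print("greedy forward:    ", greedy_forward(X, y, 2))
```

Because the exhaustive search evaluates every subset of size k, its score can never be lower than the greedy one under the same scoring function. The cost is that the number of models grows combinatorially with the size of the index pool.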
Similar resources
QSPR study on benzene derivatives to some physico-chemical properties by using topological indices
A QSPR study on benzene derivatives has been made using a recently introduced topological methodology. In this study the relationship between the Randić (x'), Balaban (J), Szeged (Sz), Harary (H), Wiener (W), Hyper-Wiener and Wiener Polarity (WP) indices and the thermal energy (Eth), heat capacity (CV) and entropy (S) of benzene derivatives is presented. Physicochemical properties are taken from the quant...
QSPR models to predict thermodynamic properties of some mono and polycyclic aromatic hydrocarbons (PAHs) using GA-MLR
Quantitative Structure-Property Relationship (QSPR) models for modeling and predicting thermodynamic properties such as the enthalpy of vaporization at standard condition (ΔH˚vap kJ mol-1) and normal temperature of boiling points (T˚bp K) of 57 mono and Polycyclic Aromatic Hydrocarbons (PAHs) have been investigated. The PAHs were randomly separated into 2 groups: training and test sets. A set o...
Determination of critical properties of Alkanes derivatives using multiple linear regression
This study presents some mathematical methods for estimating the critical properties of 40 different types of alkanes and their derivatives including critical temperature, critical pressure and critical volume. This algorithm used QSPR modeling based on graph theory, several structural indices, and geometric descriptors of chemical compounds. Multiple linear regression was used to estimate the ...
Quantitative Structure-Property Relationship Modeling of the Redox Potential for Some Phenolic Antioxidants
In this work, quantitative structure-property relationship (QSPR) approaches were used to predict the redox potential of 42 phenolic antioxidants. The structures of all compounds were optimized by the AM1 semi-empirical method and then a large number of molecular descriptors were calculated for each compound in the data set. Subsequently, stepwise multilinear regression was applied to select the mos...
y-Randomization and Its Variants in QSPR/QSAR
y-Randomization is a tool used in validation of QSPR/QSAR models, whereby the performance of the original model in data description (r2) is compared to that of models built for permuted (randomly shuffled) response, based on the original descriptor pool and the original model building procedure. We compared y-randomization and several variants thereof, using original response, permuted response...